Lighter: fast and memory-efficient error correction without counting
Authors
Correspondence: Department of Computer Science, Johns Hopkins University, Baltimore, 21218, USA. The full list of author information is available at the end of the article.
Abstract
Lighter is a fast, memory-efficient tool for correcting sequencing errors. Lighter avoids counting k-mers. Instead, it uses a pair of Bloom filters: one holds a sample of the input k-mers, and the other holds k-mers likely to be correct. As long as the sampling fraction is adjusted in inverse proportion to the sequencing depth, the Bloom filter size can be held constant while accuracy stays nearly constant. Lighter is parallelized, uses no secondary storage, and is both faster and more memory-efficient than competing approaches while achieving comparable accuracy.
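The two-filter idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Lighter's implementation.

- The first filter receives each k-mer occurrence independently with probability alpha = c / depth.
- The tuning constant c, the per-base "trust" rule used to fill the second filter, and all function names are hypothetical.
- The abstract does not describe Lighter's actual trust test or its correction step, so neither is reproduced here.

```python
import hashlib
import random


class BloomFilter:
    """Minimal Bloom filter: m bits, h hash functions derived from SHA-256."""

    def __init__(self, m_bits: int, num_hashes: int):
        self.m = m_bits
        self.h = num_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Salt the item with the hash index to get h (roughly) independent hashes.
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))


def kmers(read: str, k: int):
    return (read[i:i + k] for i in range(len(read) - k + 1))


def sampling_fraction(depth: float, c: float = 7.0) -> float:
    # Inverse proportion to depth; c is a hypothetical tuning constant.
    return min(1.0, c / depth)


def prob_sampled(occurrences: int, alpha: float) -> float:
    # Chance that a k-mer occurring `occurrences` times is sampled at least once.
    return 1.0 - (1.0 - alpha) ** occurrences


def sample_kmers(reads, k, alpha, m_bits=1 << 20, num_hashes=4, seed=0):
    """Filter 1: insert each k-mer occurrence independently with probability alpha."""
    rng = random.Random(seed)
    sampled = BloomFilter(m_bits, num_hashes)
    for read in reads:
        for km in kmers(read, k):
            if rng.random() < alpha:
                sampled.add(km)
    return sampled


def build_trusted(reads, k, sampled, min_hits=1):
    """Filter 2: k-mers whose every base is covered by >= min_hits sampled k-mers.

    This per-base rule is an illustrative assumption, not taken from the abstract.
    """
    trusted = BloomFilter(sampled.m, sampled.h)
    for read in reads:
        hits = [km in sampled for km in kmers(read, k)]
        cover = [0] * len(read)
        for i, hit in enumerate(hits):
            if hit:
                for j in range(i, i + k):
                    cover[j] += 1
        for i in range(len(hits)):
            if all(cover[j] >= min_hits for j in range(i, i + k)):
                trusted.add(read[i:i + k])
    return trusted
```

`prob_sampled` shows why scaling alpha as c / depth keeps behaviour stable. A true k-mer seen about depth times is sampled with probability 1 - (1 - c/depth)^depth, which is close to 1 - e^(-c) at any large depth. For example, `prob_sampled(100, 0.07)` and `prob_sampled(1000, 0.007)` are both about 0.999. A k-mer seen only once, such as one containing a sequencing error, is sampled with probability alpha. Because the expected number of sampled copies per true k-mer stays fixed, the filter size need not grow with depth.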
Similar resources
A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
Motivation: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be...
Supplementary material for “Lighter: fast and memory-efficient error correction without counting”
For a standard Bloom filter, each of the h hash functions can map item o to any element of the bit array. The bit array is often very large, much larger than the processor cache, so each probe into it is likely to cause a cache miss. Putze et al. [5] proposed a blocked Bloom filter. Given a block size b, the first hash function H0(o) is used to select a size-b block of consec...
MSPKmerCounter: A Fast and Memory Efficient Approach for K-mer Counting
Motivation: A major challenge in next-generation sequencing (NGS) is assembling the massive number of overlapping short reads that are randomly sampled from DNA fragments. Assembly depends on a fundamental task shared by many leading assembly algorithms: counting the number of occurrences of k-mers (length-k substrings of the sequences). The counting results are critical for many compone...
Optimal fast digital error correction method of pipelined analog to digital converter with DLMS algorithm
In this paper, the convergence rate of the digital error correction algorithm, which corrects capacitor mismatch error and the finite, nonlinear gain of the op-amp, is increased significantly by using DLMS, an evolutionary search algorithm. To this end, a 16-bit pipelined analog-to-digital converter was modeled. The resulting digital model is an FIR filter with 16 adjustable weights. To adjust weights o...
Squeakr: an exact and approximate k-mer counting system
Motivation: k-mer-based algorithms have become increasingly popular in the processing of high-throughput sequencing data. These algorithms span the gamut of the analysis pipeline, from k-mer counting (e.g. for estimating assembly parameters) to error correction, genome and transcriptome assembly, and even transcript quantification. Yet, these tasks often use very different k-mer representations ...
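Several of the abstracts above (the lock-free counter, MSPKmerCounter, Squeakr) center on k-mer counting, the step Lighter avoids. The sketch below is a naive, single-threaded Python illustration of what "counting the occurrences of every k-mer" means. It is not the lock-free, partitioned, or quotient-filter method of any of those papers, and `count_kmers` is a hypothetical helper name.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Naive, single-threaded count of every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


counts = count_kmers("ACGTACGT", 3)
print(sorted(counts.items()))  # [('ACG', 2), ('CGT', 2), ('GTA', 1), ('TAC', 1)]
```

An exact table like this grows with the number of distinct k-mers, including the many that contain sequencing errors. Fixed-size Bloom filters sidestep that growth at the cost of occasional false positives.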
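The blocked Bloom filter described in the Lighter supplementary material can be sketched in Python as below. The excerpt is truncated, so part of the design is assumed.

- As in the standard blocked design, H0 selects a block of b consecutive bits.
- A further h hash functions then set or test bits only inside that block.
- The block size of 512 bits (a common 64-byte cache line), the SHA-256-based hashes and the class name are all hypothetical choices.

```python
import hashlib


class BlockedBloomFilter:
    """Blocked Bloom filter sketch: H0 picks one block of block_bits consecutive
    bits, and the remaining hashes set bits only inside that block, so every
    add or lookup touches a single block instead of h scattered locations."""

    def __init__(self, num_blocks: int, block_bits: int = 512, num_hashes: int = 4):
        assert block_bits % 8 == 0, "blocks must be whole bytes"
        self.num_blocks = num_blocks
        self.b = block_bits   # 512 bits = 64 bytes, a common cache-line size
        self.h = num_hashes   # hashes used inside the block (H1..Hh)
        self.bits = bytearray(num_blocks * block_bits // 8)

    @staticmethod
    def _hash(item: str, i: int) -> int:
        # The i-th hash function: SHA-256 of the item salted with i.
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def _bit_positions(self, item: str):
        base = (self._hash(item, 0) % self.num_blocks) * self.b  # H0 selects the block
        for i in range(1, self.h + 1):
            yield base + self._hash(item, i) % self.b  # offset within that block

    def add(self, item: str) -> None:
        for p in self._bit_positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._bit_positions(item))
```

Python does not guarantee that a block inside a `bytearray` is cache-line aligned. This sketch therefore shows the indexing scheme rather than the cache-miss savings. The trade-off of the blocked design is a slightly higher false-positive rate than a standard Bloom filter of the same total size, because bits concentrate within blocks.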